ABSTRACT

Biogenesis and molecular function are two key
subjects in the field of microRNA (miRNA)
research. Deep sequencing has become the principal
technique for cataloging the miRNA repertoire and
generating expression profiles in an unbiased
manner. Here, we describe the miRGator v3.0
update (http://mirgator.kobic.re.kr), which compiles
the publicly available miRNA deep sequencing data
and implements several novel tools to facilitate
exploration of massive data. The miR-seq
browser lets users examine short read alignments
together with the secondary structure and read count
information in concurrent windows.
Features such as sequence editing, sorting,
ordering, import and export of user data should
be of great utility for studying iso-miRs, miRNA
editing and modifications. The miRNA-target relation
is essential for understanding miRNA function.
Coexpression analysis of miRNAs and target
mRNAs, based on miRNA-seq and RNA-seq data
from the same sample, is visualized in heat-map
and network views, where users can investigate
the inverse correlation of gene expression and
target relations compiled from various databases
of predicted and validated targets. By keeping
datasets and analytic tools up-to-date, miRGator
should continue to serve as an integrated resource
for biogenesis and functional investigation of
miRNAs.



INTRODUCTION 

Over the past 2 years, the number of known microRNAs
(miRNAs) in humans has almost tripled (1). The catalog of
miRNA information is usually deposited in databases
such as miRBase (1) and PMRD (2). In miRNEST (3),
novel miRNA candidates are predicted from expressed
sequence tag (EST) sequences in various animals, plants
and viruses. miRNAs with related sequences are
grouped into RNA families, as in Rfam (4).

Regarding miRNA targets, validated targets are still
sparse but are available at miRecords (5), TarBase (6)
and miRTarBase (7). Many target prediction methods
have been developed, including TargetScan (8), microRNA.org
(9), miRBase (1), PITA (10), PicTar (11), miRDB (12) and
their combinations (13). These programs usually suffer
from a large number of false positives. Other tools
that provide analytic functions based on miRNA and
mRNA expression profiles include HOCTAR (14) and
miRFANS (15).

The biology of miRNAs is turning out to be much more
complex than initially thought: a single miRNA
may have multiple isoforms (iso-miRs) and often
undergoes modifications such as 3'-nucleotide addition
(16). Comprehensive profiling of such miRNA variants
is necessary to understand the function of miRNAs in
the context of various human diseases and other
perturbations. Deep sequencing is rapidly replacing
hybridization-based methods because of its ability to catalog
and quantify miRNAs (and their variants) in an unbiased
and accurate manner. Accordingly, several web tools and
databases, including deepBase (17), miRTools (18),
miRanalyzer (19) and miRDeepFinder (20), have been
developed to analyze deep sequencing data.



*To whom correspondence should be addressed. Tel: +82 232772888; Fax: +82 232773760; Email: sanghyuk@kribb.re.kr 
Correspondence may also be addressed to Wankyu Kim. Tel: +82 232774132; Fax: +82 232773760; Email: wkim@ewha.ac.kr 

The authors wish it to be known that, in their opinion, the first three authors should be regarded as joint First Authors. 

© The Author(s) 2012. Published by Oxford University Press. 

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc/3.0/), which
permits non-commercial reuse, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact 
journals.permissions@oup.com. 



Nucleic Acids Research, 2013, Vol. 41, Database issue D253 



Even though deep sequencing has become the main
driving force in uncovering novel miRNAs and expression
changes, we still lack a comprehensive and integrated
database of miRNA sequencing, expression profiling and
targeting information, implemented with proper tools.
Here, we introduce miRGator v3.0, which consolidates
an extensive collection of deep sequencing datasets. The user
interface has been fully renovated with a dedicated miRNA-seq
browser and two novel viewers that enable users to
examine miRNA-target relationships with expression
correlation information readily accessible. We describe
the main characteristics of the updated system in the
following sections.

SYSTEM OVERVIEW 

The schematic overview of miRGator v3.0 is shown in
Figure 1. We have included publicly available deep
sequencing data, which have become the principal resource
for information on miRNA diversity and expression.
The datasets were manually curated into ontology-based
disease and tissue categories. We have compiled 73 studies
with 4665 samples into 38 disease and 71 anatomic
categories.

Major features, summarized in Figure 1, include (i) the
miR-seq browser, which allows users to examine short
read alignments for identifying iso-miRs and differential
expression in multiple samples; (ii) expression profiles in
various organs, tissues and diseases, based on deep
sequencing data; (iii) a novel representation of miRNA-target
relations in correlation heat-maps and network
views of gene expression and (iv) gene set analysis for
functional annotation of miRNA-associated genes.



DATASETS AND PROCESSING OF SEQUENCING DATA

We have collected 73 deep sequencing datasets on human
samples from Gene Expression Omnibus (GEO) (21),
Short Read Archive (SRA) (22) and The Cancer
Genome Atlas (TCGA) archives (23). GEO and SRA
included 54 studies of miRNA and mRNA sequencing
(716 samples and 4.1 billion short reads). Additionally,
we added the expression profiles of miRNAs and
mRNAs in cancer samples from the TCGA archive (19
studies, 3949 samples in 17 cancer types). TCGA data
are particularly useful in investigating the inverse
expression correlation of miRNA and target mRNAs in various
types of cancer. Note that the TCGA level 3 data include
the processed output only, not the raw sequence data. All
GEO/SRA experiments and TCGA data were manually
annotated into tissue and disease types using the
controlled vocabularies of eVOC (24) and MeSH (25),
respectively. Table 1 shows the summary of datasets
included in this update.

Figure 1. System overview of miRGator v3.0.

Table 1. Statistics for deep sequencing data and curation results.

The miRNA deep sequencing data were aligned to the
reference human genome (hg19) using the Bowtie program
(version 0.12.7) (26) after trimming adaptor sequences
with Cutadapt (version 1.1) (27). The adaptor sequences
were obtained from the original paper or the manufacturer's
platform. Up to two mismatches were allowed in the
alignment process to identify iso-miRs or miRNA
modifications. Short reads mapped onto the known miRNA
loci from miRBase v18 (1) or ncRNA regions from Ensembl
(release 67) (28) were classified as miRNA or ncRNA reads,
respectively. This procedure yielded 1856 known miRNAs
and 6424 ncRNAs. The remaining reads were used to predict
novel miRNAs using the miRDeep2 software (29). Using an
estimated true-positive probability of 95% and a randfold
P-value of 0.05, we obtained 508 mature and 304
pre-miRNA candidates. Further details of the analysis
pipeline and program options are available in the online
documentation.
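
The read classification step described above can be illustrated with a minimal Python sketch. It is not the miRGator pipeline itself: the function names, the coordinate tuples, the requirement that read and locus share a strand, the one-base overlap criterion and the precedence of miRNA over ncRNA annotation are all assumptions made for illustration.

```python
from typing import List, Tuple

# (chromosome, start, end, strand); coordinates are 1-based and inclusive.
# This interval layout is an assumption of the sketch.
Interval = Tuple[str, int, int, str]


def overlaps(read: Interval, locus: Interval) -> bool:
    """Return True if read and locus are on the same chromosome and strand
    and share at least one base."""
    r_chrom, r_start, r_end, r_strand = read
    l_chrom, l_start, l_end, l_strand = locus
    return (
        r_chrom == l_chrom
        and r_strand == l_strand
        and r_start <= l_end
        and l_start <= r_end
    )


def classify_read(
    read: Interval,
    mirna_loci: List[Interval],
    ncrna_loci: List[Interval],
) -> str:
    """Label an aligned read as 'miRNA', 'ncRNA' or 'unannotated'.

    miRNA loci are checked first (an assumed precedence); reads labelled
    'unannotated' correspond to the remaining reads that would be passed
    on to novel miRNA prediction.
    """
    if any(overlaps(read, locus) for locus in mirna_loci):
        return "miRNA"
    if any(overlaps(read, locus) for locus in ncrna_loci):
        return "ncRNA"
    return "unannotated"
```

For a realistic number of reads, the linear scan over loci would normally be replaced by an interval index; the scan is kept here only to make the logic explicit.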

For quantification of miRNA abundance, we used the
quantile normalization method for read numbers within
each miRNA locus. Differentially expressed miRNAs
(DEmiRs) between tumor and normal tissues were
identified with the edgeR program (version 2.6.10) (30) after
rounding the normalized values to the nearest
integer.
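
As an illustration of the normalization step, the following Python sketch applies a textbook quantile normalization to a small count matrix and then rounds the result to integers. The matrix layout (rows as miRNAs, columns as samples), the function names, the tie handling and the round-half-up rule are assumptions of this sketch; the exact options used in miRGator are given in its online documentation.

```python
import math
from typing import List


def quantile_normalize(counts: List[List[float]]) -> List[List[float]]:
    """Quantile-normalize a matrix where counts[i][j] is the read count of
    miRNA i in sample j, so that all samples share one distribution.

    Ties within a sample are broken by row order (a simplification).
    """
    n_rows, n_cols = len(counts), len(counts[0])
    # Sort the values of each sample (column) independently.
    sorted_cols = [
        sorted(counts[i][j] for i in range(n_rows)) for j in range(n_cols)
    ]
    # Reference distribution: mean of the k-th smallest value across samples.
    rank_means = [
        sum(col[k] for col in sorted_cols) / n_cols for k in range(n_rows)
    ]
    normalized = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        order = sorted(range(n_rows), key=lambda i: counts[i][j])
        for rank, i in enumerate(order):
            normalized[i][j] = rank_means[rank]
    return normalized


def to_integer_counts(matrix: List[List[float]]) -> List[List[int]]:
    """Round non-negative values to the nearest integer (halves round up),
    giving count-like input for a count-based differential expression test."""
    return [[math.floor(value + 0.5) for value in row] for row in matrix]
```

After normalization every sample column contains the same set of values, only assigned to different miRNAs according to their within-sample rank.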

RNA-seq data were aligned to the human genome
(hg19) by the TopHat program (version 2.0.0) (31) after
removing adaptor sequences and carefully checking
read quality. Cufflinks (version 1.3.0) (32) was used to
quantify the mRNA abundance.



miR-seq BROWSER 

The miR-seq browser was specifically designed to examine the
sequence alignment and normalized read counts together with the
secondary structure information in an intuitive and interactive
fashion. Short reads related to iso-miRs and miRNA editing
can be readily identified with the corresponding expression
values (read counts) in multiple samples. This feature can be
of significant value for scientists studying the biological roles of
iso-miRs and miRNA editing.

Figure 2 shows a screenshot of the miR-seq browser. The
secondary structure, obtained from the Vienna RNA package
(33), is displayed in the top panel and is also indicated by
different shades in the alignment window. Selecting a
nucleotide in the secondary structure highlights the
corresponding nucleotide in the sequence alignment panel.
Mismatched nucleotides are shown in red. Users
may add, delete or edit read sequences. The read count
table can be used to explore the variable expression of
iso-miRs and differential miRNA processing. The expression
level is also reflected in the background color of each cell
in this table. We have further implemented many
user-friendly features such as zoom-in/out, reordering of
reads (drag & drop), sorting by expression level and
save/restore support of the configuration. It is also possible to
upload user sequences in the BAM file format.
Detailed instructions for using the miR-seq browser are
available in the online help page.
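
Two of the browser operations described above, flagging mismatched nucleotides and sorting reads by expression level, can be sketched in a few lines of Python. This is only an illustration of the idea, not the browser's implementation; the function names, the ungapped alignment given by a 0-based offset and the dictionary layout of the read count table are assumptions.

```python
from typing import Dict, List


def mismatch_positions(reference: str, read: str, offset: int) -> List[int]:
    """Return the 0-based positions within the read that differ from the
    reference, for an ungapped alignment starting at `offset`."""
    if offset < 0 or offset + len(read) > len(reference):
        raise ValueError("read must lie entirely within the reference")
    return [i for i, base in enumerate(read) if base != reference[offset + i]]


def sort_reads_by_total(read_counts: Dict[str, List[int]]) -> List[str]:
    """Order read sequences by their total count over all samples,
    most abundant first. read_counts maps a read sequence to its
    per-sample counts."""
    return sorted(
        read_counts, key=lambda seq: sum(read_counts[seq]), reverse=True
    )
```

A read whose only mismatch falls at its last position would be a candidate for a 3'-end modification, although distinguishing modifications from sequencing errors requires more evidence than a single alignment.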



miRNA, TARGET mRNA AND EXPRESSION CORRELATION

Inferring the molecular functions of miRNAs is a non-trivial
process because of the uncertainty in the relationships between
miRNAs and target mRNAs. Only a small portion of
target mRNAs is known, and only for a limited number of
miRNAs, and typical prediction programs tend to yield too many
false positives. We have compiled a variety of miRNA-mRNA
relationships and integrated them with expression
correlations to help users readily identify reliable
targets.

Validated miRNA target genes were obtained from
miRecords (version 3), miRTarBase (version 2.5) and
TarBase (version 5). Predicted target relationships were
collected from Microcosm Targets (version 5) (34),
miRDB (version 4), miRNA.org (August 2010), PITA
(version 6), PicTar (May 2004) and TargetScan (version
6.2). In total, miRGator v3.0 includes 4745 validated and
6 218 792 predicted target relations, nearly double the
number in the previous version.

Expression correlation is useful information for distinguishing
direct from indirect targets. Inversely correlated
expression of a miRNA and its putative target mRNAs is
strong evidence for a genuine relation. We calculated the
correlation coefficient using the deep sequencing data of
mRNA-seq and miRNA-seq from the same sample. We
used the Spearman rank correlation, which is robust to
differences in normalization methods between mRNA-seq
and miRNA-seq data.
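
The Spearman rank correlation is the Pearson correlation computed on ranks, which is why any monotonic rescaling of either data type leaves it unchanged. A minimal pure-Python sketch is given below; the function names are illustrative, and tied values receive their average rank, a common convention that is assumed here rather than taken from the miRGator implementation.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Assign 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def spearman(mirna_expr: Sequence[float], mrna_expr: Sequence[float]) -> float:
    """Spearman rank correlation between miRNA and mRNA expression values
    measured in the same samples (one value per sample, same order)."""
    if len(mirna_expr) != len(mrna_expr):
        raise ValueError("both profiles must cover the same samples")
    return pearson(average_ranks(mirna_expr), average_ranks(mrna_expr))
```

A coefficient close to -1 indicates that the mRNA level decreases consistently as the miRNA level increases across samples, which is the pattern expected for a direct target.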

Figure 2. Main features of the miR-seq browser. In the top panel, the hairpin structure of the miRNA precursor is shown. The aligned short reads are
shown together with secondary structure and read depth information in the track. On mouseover (hand icon) of a nucleotide, the corresponding
column is highlighted with a vertical pink shadow. The reads can be sorted by the read count of each sample or by the total sum in the right panel. Note
that several read sequences show 3'-end modifications. The histogram shows the read depth at each position. Mismatched nucleotides are highlighted in
red. The sequence editor window is opened by right-click.

Target relations and expression correlations are visually
represented in two formats, as shown in Figure 3. The
heat-map view shows the expression correlation between
a miRNA and its target mRNAs within each dataset. A
scatter plot of miRNA and mRNA expression can be
displayed by clicking a cell to examine sample-dependent
variation of gene expression. The source of target
information is also indicated to help users identify consensus
targets, which are more likely to be genuine targets (13).
All information on target relationships and expression
correlations is downloadable in Excel format to allow more
elaborate analysis by users.
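
The notion of a consensus target, a miRNA-gene pair supported by several independent sources, can be expressed as a simple counting step. The sketch below is illustrative only: the function name, the input layout, the threshold of two sources and the placeholder gene names are assumptions, and the example pairs are not taken from any database.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, str]  # (miRNA name, target gene symbol)


def consensus_targets(
    predictions: Dict[str, Iterable[Pair]], min_sources: int = 2
) -> Dict[Pair, List[str]]:
    """Return the pairs reported by at least `min_sources` sources,
    mapped to the sorted names of the supporting sources."""
    support = defaultdict(set)
    for source, pairs in predictions.items():
        for pair in pairs:
            support[pair].add(source)
    return {
        pair: sorted(sources)
        for pair, sources in support.items()
        if len(sources) >= min_sources
    }


# Hypothetical input: source names follow the text, pairs are placeholders.
example = {
    "TargetScan": [("hsa-let-7a-5p", "GENE_A"), ("hsa-let-7a-5p", "GENE_B")],
    "PITA": [("hsa-let-7a-5p", "GENE_A")],
    "miRDB": [("hsa-let-7a-5p", "GENE_A"), ("hsa-let-7a-5p", "GENE_C")],
}
```

With this input, only the pair involving GENE_A is supported by more than one source, so it is the only one returned at the default threshold.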

The network view shows the target relationships in a graph
visualization format. Users may select validated or
predicted target relations, the study ID of the source data and
samples. Gene expression levels or fold changes, if
applicable, are shown as the node color. The network view
illustrates the target relations and expression correlations
in a more intuitive manner, but it is limited to displaying the
expression in a single study or sample. It should be noted
that the miRNA-mRNA relation can be queried either by
the miRNA name or by the gene name, which is a useful
feature for investigating any synergistic effect in miRNA or
gene function (35).

GENE SET ANALYSIS 

Gene Set Analysis (GSA) is commonly used in
interpreting a list of genes from high-throughput experiments such
as microarray and mass spectrometry. The GSA tool of
miRGator v3.0 enables the user to compare a list of genes
against a priori defined gene sets such as KEGG pathways,
Gene Ontology, the validated/predicted miRNA target
DBs and inversely coexpressed gene sets as described in
the previous section. The statistical significance is
calculated as a P-value by the hypergeometric test, which is
corrected for multiple tests using the Bonferroni method.
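
The enrichment statistic can be written out explicitly. For a background of N genes, a gene set of size K, a query list of size n and an overlap of k genes, the one-sided hypergeometric P-value is the probability of observing k or more shared genes by chance. The Python sketch below computes this tail probability and applies the Bonferroni correction; the function and parameter names are illustrative, and the choice of background population is an assumption left to the caller.

```python
from math import comb
from typing import List


def hypergeom_pvalue(
    population: int, set_size: int, query_size: int, overlap: int
) -> float:
    """P(X >= overlap) for X drawn from a hypergeometric distribution:
    `query_size` genes sampled without replacement from `population`
    genes, of which `set_size` belong to the gene set."""
    upper = min(set_size, query_size)
    total = comb(population, query_size)
    tail = sum(
        comb(set_size, k) * comb(population - set_size, query_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


def bonferroni(pvalues: List[float]) -> List[float]:
    """Multiply each P-value by the number of tests, capped at 1."""
    n_tests = len(pvalues)
    return [min(1.0, p * n_tests) for p in pvalues]
```

The Bonferroni correction is conservative when many overlapping gene sets are tested, so a gene set that remains significant after correction is unlikely to be a chance finding.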



USER INTERFACE 

The miRGator v3.0 website incorporates various
user-friendly features. Most menus are self-explanatory
except for the miR-seq browser, for which detailed
instructions are available in the help page. A basic search can be
performed for miRNA, disease and anatomy names. The
search window suggests plausible keywords and supports
an auto-complete mode.

The search output for a miRNA query consists of (i) basic
information including GeneRIF information, (ii) relevant
studies, (iii) samples in the selected study, where the link
to the miR-seq browser is available, and (iv) miRNA expression
profiles in disease, tissue and organ categories. Anatomy or
disease queries return the relevant studies and the DEmiRs
from each study. A search in the 'miR-target & Expression'
menu can be performed for a miRNA or gene of interest,
and miRNA-target information is produced with expression
correlation as explained in the previous section.

Figure 3. Concurrent inspection of miRNA-mRNA target relations and expression correlation for hsa-let-7a-5p. (a) The validated and predicted
miRNA targets are shown together with their expression correlations as a heat-map. The expression values of the miRNA-target pair can be shown for
a dataset by clicking a cell, as shown in the inset picture. (b) An example of miRNA-target network visualization. The targets showing the opposite
expression pattern to the miRNA are closely placed.

CONCLUSION 

With the addition of deep sequencing data and the
implementation of several novel tools, miRGator v3.0 continues to
be an integrated resource of up-to-date information on
miRNA sequences, expression profiling and target
identification. These new data and functions should be valuable
for understanding miRNA biogenesis and molecular
functions. However, many aspects remain to be improved.
Regular updating of the rapidly growing data is the most critical
part, since many sequencing studies, including the TCGA
project, are currently in progress. We plan to update the
data annually. Another major planned advance is to
expand the scope to other organisms such as mice, where
detailed phenotype information is available via the
International Mouse Phenotyping Consortium.